This workshop offers an introduction to time series cross sectional (TSCS) data with a focus on applications in R. Some familiarity with time series cross sectional econometrics, as well as R, is assumed. We will cover some basic theoretical results, but this workshop is not intended to substitute for a more detailed theoretical treatment.
The workshop starts with some basic description of the structure of TSCS data and the risks of the pooled (OLS) estimator, then covers standard approaches to unit-level heterogeneity: the fixed effect (within) estimator, the random effect estimator. Interpretation and drawbacks of the fixed effect estimator are discussed, as well as a potential solution to the time invariant covariate problem via penalized maximum likelihood. We will discuss testing the key assumption behind the random effects model, and introduce the unifying “within-between” model.
The second section focuses on issues related to time. We start by considering the implications of a static model, then incorporate basic dynamics via the finite distributed lag and lagged dependent variable (LDV) specification. We briefly discuss tests for serial correlation in the model errors, as well as introduce the Anderson-Hsiao estimator for FE estimation with an LDV.
install.packages() and library() are common package downloads and loading. Here we’ll load some useful packages. pacman::pload() combines both of these in one function.
if (!require("pacman")) install.packages("pacman", repos = "https://cloud.r-project.org")
pacman::p_load("tidyverse", "data.table", "readstata13", "plm", "bife", "survival", "brglm2", "ivreg", "lmtest", "tseries", "plotly")
#setwd()
set.seed(1015)
Some clarification of terminology
A linear regression models a dependent variable \(Y\) as a function of one or more independent variables (\(X\)) and an error term \(\epsilon\)
(Ordinary) Least Squares (OLS/LS) is a popular estimator of the linear model, but not the only estimator
Ridge Regression / Lasso are examples of other estimators of the linear model
From an ML perspective, linear regression is a form of supervised learning while OLS is uses error minimization as measured by the sum of squared residuals
Create a univariate linear relationship using a Monte Carlo simulation
sample <- 200
x <- rnorm(sample, 0, 2)
e <- rnorm(sample, 0 , 10)
y <- 3 + x + e
univariate <- data.frame(y, x)
univariate
plot<-univariate %>%
ggplot(aes(x, y)) +
geom_point()
ggplotly(plot)
Estimating a univariate linear model is equivalent to drawing a straight line through this scatter plot
Can fit any number of linear models to this data - need a criteria for determining the best model
OLS posits that the best model is one that minimizes the error of the model’s predicted \(Y\) and the real \(Y\)
OLS measures model error as the sum of squared resiudals
univariate$pred <- predict(lm(y~x, data=univariate))
univariate$resid <- univariate$y - univariate$pred
resid_plot<-univariate %>%
ggplot(aes(x, y)) +
geom_point()+
geom_smooth(method = "lm", se=F)+
geom_segment(aes(xend=x, yend = pred), color="red")
ggplotly(resid_plot)
## `geom_smooth()` using formula = 'y ~ x'
Write the unknown, but linear, data generating process for \(Y\) as
\[ Y = \beta_0 + \beta_1X + \epsilon \] Our linear model returns estimates of coefficients, \(\beta\), which we denote with \(\hat{}\). We use these estimates to construct the estimate of \(Y\)
\[\hat{Y} = \hat{\beta_0} + \hat{\beta_1}X \] We define the residual \(e\) as the difference between \(Y\) and its estimate \(\hat{Y}\), which we use as an estimate of the unknown disturbance \(\epsilon\). Note that we move to estimates for an individual observation \(i\) below:
\[ e_i = \hat{y}_i - y_i\] Finally, we denote the residual sum of squares (RSS) as the sum of the squared residuals for all observations in the data
\[ RSS = \sum_{i=1}^{n}e_i^2\] OLS estimates the coefficients \(\beta\) that minimize \(RSS\). For the simple bivariate case the coefficient on \(X\) is simply the ratio of it’s covariance with \(Y\) over its variance:
\[\hat{\beta_1} = \frac{\sum_{i=1}^{n}(x_i - \overline{x})(y_i - \overline{y})}{\sum_{i=1}^{n}(x_i - \overline{x})^2}\] The constant is simply the difference between the mean of \(Y\) and the predicted \(Y\) at the mean of \(X\)
\[\hat{\beta_0} = \overline{y} - \hat{\beta_1}\overline{x}\] Given that the coefficient estimates that minimize \(RSS\) are known there is no need to perform model training or tuning to estimate them - they can be calculated directly from the data
coef(lm(y~x, data=univariate))
## (Intercept) x
## 2.602079 1.001037
mean_x <- mean(x)
mean_y <- mean(y)
beta1_hat <-(sum((univariate$x - mean_x)*(univariate$y - mean_y)))/(sum((univariate$x-mean_x)^2))
beta0_hat <- mean_y - beta1_hat*mean_x
print(c(beta0_hat, beta1_hat))
## [1] 2.602079 1.001037
Under certain assumptions, OLS is the ‘best’ estimator for the linear model
Zero conditional mean assumption (along with some other technical assumptions) is sufficient to ensure unbiasedness (\(E(\hat{\beta})=\beta\)):
\[E(\epsilon|X) = 0 \] Note that this is an assumption about the unobserved error \(\epsilon\) and cannot be tested - OLS will fit residual \(e\) that is mean zero whether or not it’s appropriate
The additional assumption of Homoskedastic and Serially Uncorrelated Errors makes OLS the Minimum Variance Unbiased Linear Estimator
\[Var(\epsilon | X) = \sigma^2 \] \[Corr(\epsilon_t, \epsilon_s) = 0, \forall t \neq s \] Informally, these two assumptions ensure that of all the unbiased linear estimators of the linear model, OLS has the smallest variance
This does not mean OLS is the absolute best estimator in any circumstance
If the standard errors satisfy the above assumptions, but are not normally distributed, then non-linear unbiased estimators may offer improvement, such as trimmed least squares
Variance may be reduced by using a biased estimator, such as ridge regression
OLS is unbaised, but any individual estimate of \(\hat\beta\) may be far from \(\beta\)
The variance of \(\hat\beta_1\) is a function of the variance of the error and the variance of \(X\)
Higher variance in the error makes estimation less precise while higher variance in \(X\) makes it more precise:
\[ Var(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^n(x_i-\overline{x})^2} \] Denominator is known as a function of the data - numerator is unknown and must be estimated
An unbiased estimator of \(\sigma^2\) is \(RSS/(n-2)\) where \(n-2\) is a degrees of freedom adjustment due to constraints of OLS
For test statistics, we work with the standard error of the coefficient, as opposed to the variance
\[se(\hat\beta_1) = \sqrt\frac{\sigma^2}{\sum_{i=1}^n(x_i-\overline{x})^2} = \sqrt\frac{RSS/(n-2)}{\sum_{i=1}^n(x_i-\overline{x})^2} \]
Consider a linear model where \(Y\) is a function of two independent variables:
\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\]
x1 <- rnorm(sample, 0, 5)
x2 <- 2*x1 + rnorm(sample, 0, 5)
y <- 2*x1-x2+rnorm(sample, 0,5)
multivariate <- data.frame(x1, x2, y)
multi_plot<-plot_ly(multivariate, x = ~x1, y = ~x2, z = ~y)
multi_plot<-multi_plot %>%
add_markers()
multi_plot
Estimation of this linear model via OLS is similar
Recall OLS minimizes \(RSS\)
\[ RSS = \sum_{i=1}^{n}e_i^2\] \[ = \sum_{i=1}^{n}(\hat{y_i} - y_i)^2 \] \[ = \sum_{i=1}^{n}((\hat{\beta_0} + \hat{\beta_1}X_{1i} + \hat{\beta_2}X_{2i}) - y_i)^2 \] When we model \(Y\) as a function of \(X_2\) OLS minimizes \(RSS\) over both \(X\)
The formulas for coefficients of multivariate regression are more complicated, but one representation:
\[\hat\beta_1 = \frac{\sum_{i=1}^{n}\hat{r_{i1}}y_i}{\sum_{i=1}^{n}(\hat{r_{i1}})^2} \] Where \(\hat{r_{i1}}\) is the residual from the auxiliary regression of \(X_1\) on \(X_2\)
\[ X_1 = \hat\gamma_0+\hat\gamma_1{X_2} + \hat{r_1} \] Note that our new estimate for \(\hat\beta_1\) is identical to our old estimate except that \(Cov(\hat{r}, y)\) and \(Var(\hat{r})\) have replaced \(Cov(x, y)\) and \(Var(x)\) respectively
The specific form of these variances and covariances is slightly different - note no means! We’ve applied two simplifications to drop means
\[ E[\hat{r}] = 0 \]
\[ Cov(x, y) = \sum_{i=1}^{n}(x_i - \overline{x})(y_i - \overline{y}) = \sum_{i=1}^{n}(x_i - \overline{x})y_i \]
multivariate$resid <- residuals(lm(x1~x2, multivariate))
partial <- lm(y~resid, multivariate)
full <- lm(y~x1+x2, multivariate)
print(c(coef(partial), coef(full)))
## (Intercept) resid (Intercept) x1 x2
## 0.1388380 1.7082292 0.3262511 1.7082292 -0.8831974
short <- lm(y~x1, multivariate)
print(coef(short))
## (Intercept) x1
## 0.17590624 -0.06440515
aux <- lm(x2~x1, multivariate)
coef(full)[2] + coef(full)[3]*coef(aux)[2]
## x1
## -0.06440515
This shows the relationship between the coefficients of the correct regression with \(X_2\) and what is called the ‘short’ regression (because it’s short \(X_2\)) where \(Y\) is regressed on \(X\) alone
\[ \tilde\beta_1 = \hat{\beta_1} + \hat{\beta_2}\hat{\gamma_1}\] Where \(\tilde{\beta_1}\) is the coefficient on \(X_1\) from the short regression. The second term is the omitted variable bias from the exclusion of \(X_2\) - the effect of \(X_2\) on \(Y\) (controlling for \(X_1\)) as well as the effect of \(X_2\) on \(X_1\) from the auxiliary regression (\(\hat{\gamma_1}\))
Can \(\tilde{\beta_1} = \hat{\beta_1}\)?
Fundamental question: does \(X\) effect \(Y\)?
Frequentist inference begins by assuming the opposite. We refer to this as the null hypothesis:
\(H_0\): There is no relationship between \(X\) and \(Y\)
Under the null hypothesis - what do we expect \(\hat\beta_1\) to be?
Statistically, we want to test how likely our estimate of \(\hat\beta_1\) under the null hypothesis - in other words if the effect of \(X\) is zero, how unlikely is our estimate?
This depends on the coefficient and it’s variance, which we use to construct a t-statistic:
\[t = \frac{\hat\beta_1 - \beta_{1|H_0}}{se(\hat\beta_1)} \]
Under the normality assumption:
\[ \epsilon \sim Normal(0, \sigma^2) \]
The t-statistic follows a t-distribution with n-k-1 degrees of freedom. Knowing this we can calculate a p-value which represents the probability that we could draw a test statistic at least as large as \(T\) (one tailed test) or at least as large as \(|T|\) (two tailed)
Note that normality implies homoskedasticity
univariate$y_hat = beta0_hat + beta1_hat * univariate$x
rss = sum((univariate$y_hat - univariate$y)^2)
se_beta1 <- sqrt(
(rss/(sample-2))/
sum((univariate$x - mean_x)^2)
)
t <- beta1_hat/se_beta1
p <- 2*pt(abs(t), df=sample-2, lower=F)
print(c(t, p))
## [1] 2.863327407 0.004643621
lm <- lm(y~x, data=univariate)
summary(lm)
##
## Call:
## lm(formula = y ~ x, data = univariate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.6443 -5.9501 -0.2313 6.5272 29.9443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.6021 0.7269 3.580 0.000432 ***
## x 1.0010 0.3496 2.863 0.004644 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.28 on 198 degrees of freedom
## Multiple R-squared: 0.03976, Adjusted R-squared: 0.03491
## F-statistic: 8.199 on 1 and 198 DF, p-value: 0.004644
plot(lm)
TSCS data have observations of multiple units N over time T.
Each specific observation within the data is of one unit at one time, e.g.
data<-read.dta13("ms_mil_spend_data.dta")
data %>%
filter(ccode==2, year==2000)
This observation is for the US (2) for the year 2000.
data %>%
dplyr::arrange(ccode, year) %>%
dplyr::select(ccode, year, LMILEX, v2x_polyarchy)
data %>%
dplyr::filter(ccode %in% c(2,20)) %>%
ggplot(aes(x=year, y=LMILEX)) +
geom_line() +
facet_wrap(~ccode)
summary(data$LMILEX)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.4612 5.1331 6.7051 6.6764 8.1738 13.1076
#Between unit variation
data %>%
group_by(ccode) %>%
summarize(between=mean(na.omit(LMILEX)))%>%
dplyr::select(between) %>%
ungroup() %>%
summary()
## between
## Min. :-0.2521
## 1st Qu.: 4.9785
## Median : 6.3479
## Mean : 6.4185
## 3rd Qu.: 7.8168
## Max. :12.3721
#Within unit variation
data %>%
group_by(ccode) %>%
mutate(between=mean(na.omit(LMILEX)),
within=LMILEX-mean(na.omit(LMILEX))) %>%
dplyr::select(within) %>%
ungroup() %>%
summary()
## Adding missing grouping variables: `ccode`
## ccode within
## Min. : 2.0 Min. :-6.95017
## 1st Qu.:230.0 1st Qu.:-0.42423
## Median :451.0 Median : 0.04347
## Mean :451.9 Mean : 0.00000
## 3rd Qu.:660.0 3rd Qu.: 0.48605
## Max. :950.0 Max. : 2.38023
The simplest approach to estimating a model on TSCS is to pool the data:
\(Y_{it}=\beta_0+\beta_1X_{it}+e_{it}\)
Where Y is the DV, X is the IV, \(\beta_0\) is the intercept, \(\beta_1\) is the coefficient on X and \(e_{it}\) is the error.
What are we assuming here?
All of the \(N\) units share a common intercept \(\beta_0\) and a common slope \(\beta_1\) on X.
Standard OLS assumptions regarding the error term (specifically zero conditional mean and white noise errors).
Whether pooling is appropriate depends on whether those assumptions are correct.
#Monte Carlo simulations to better control the relationships
N<-seq(1,5,1)
Time<-seq(1,50,1)
clean<-expand.grid(N, Time)
names(clean)<-c("N", "Time")
clean<- clean %>%
dplyr::arrange(N, Time)
tibble(clean)
#Draw the IV, DV, error
clean$X<-rnorm(nrow(clean), 1, .25)
clean$Y<-3-2*clean$X+rnorm(nrow(clean), 0, .5)
tibble(clean)
#Run the pooled model
clean_pool<-lm(Y~X, data=clean)
summary(clean_pool)
##
## Call:
## lm(formula = Y ~ X, data = clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.26680 -0.30666 0.03189 0.28070 1.11018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0404 0.1188 25.60 <2e-16 ***
## X -2.0432 0.1180 -17.32 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4599 on 248 degrees of freedom
## Multiple R-squared: 0.5475, Adjusted R-squared: 0.5457
## F-statistic: 300 on 1 and 248 DF, p-value: < 2.2e-16
#Plot to understand why this worked
clean %>%
ggplot(aes(x=X, y=Y))+
geom_point(aes(color=as.factor(N)))+
geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula = 'y ~ x'
Pooled model works here because there is no difference in the underlying relationship - we know this because we drew X without respect to N, Time
Let’s assume instead that there exists some unit-level heterogeneity. Some \(Z\) affects both \(Y\) and \(X\) but is time constant.
You may recognize this as the setup for omitted variable bias.
#Monte Carlo simulations to better control the relationships
N<-seq(1,5,1)
Time<-seq(1,50,1)
dirty<-expand.grid(N, Time)
names(dirty)<-c("N", "Time")
dirty<- dirty %>%
dplyr::arrange(Time, N)
#Draw the unobserved effect, IV, DV, error
Z<-rnorm(length(unique(dirty$N)), 0, .5)
dirty$Z<-rep(Z, length(Time))
dirty$X<-dirty$Z+rnorm(nrow(dirty), 1, .25)
dirty$Y<-3-2*dirty$X+4*dirty$Z+rnorm(nrow(dirty),0, .5)
#Run the pooled model
dirty_pool<-lm(Y~X, data=dirty)
summary(dirty_pool)
##
## Call:
## lm(formula = Y ~ X, data = dirty)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3102 -0.6600 0.0224 0.6095 2.3569
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1678 0.1494 1.124 0.262
## X 1.1596 0.0947 12.245 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.006 on 248 degrees of freedom
## Multiple R-squared: 0.3768, Adjusted R-squared: 0.3743
## F-statistic: 149.9 on 1 and 248 DF, p-value: < 2.2e-16
#Plot the pooled model - what are we looking at?
dirty %>%
ggplot(aes(x=X, y=Y))+
geom_point()+
geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula = 'y ~ x'
dirty %>%
ggplot(aes(x=X, y=Y))+
geom_point(aes(color=as.factor(N)))+
geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula = 'y ~ x'
#Plot the per unit regressions
dirty %>%
ggplot(aes(x=X, y=Y, color=as.factor(N)))+
geom_point()+
geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula = 'y ~ x'
In this case, pooling estimates a weak positive relationship between X and Y, despite the fact that the true relationship is negative.
This occurs because there is omitted variable bias introduced by Z.
Note that pooling would be fine if we could estimate \(Y_{it}=\beta_1X_{it}+\beta_2Z_i+e_{it}\)
With TSCS data that contains time-constant unit-level omitted variable bias, fixed effects can be used to estimate the unbiased relationship, even if the omitted variable \(Z\) is not observed.
fe<-plm(Y~X, dirty, index=c("N", "Time"), method="within")
summary(fe)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Y ~ X, data = dirty, index = c("N", "Time"), method = "within")
##
## Balanced Panel: n = 5, T = 50, N = 250
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.201097 -0.316239 0.030415 0.292112 1.389968
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## X -2.05045 0.11443 -17.918 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 120.99
## Residual Sum of Squares: 52.246
## R-Squared: 0.56819
## Adj. R-Squared: 0.55934
## F-statistic: 321.059 on 1 and 244 DF, p-value: < 2.22e-16
Why does this work?
The fixed effect estimator considers only the within-unit variation, and removes the between unit variation.
In other words, the data are group demeaned- each observation for each i is subtracted from the mean for that unit over time.
Because the effect of Z is constant over time, demeaning removes its effect and the bias it introduces
We can do this manually:
demean<-dirty %>%
dplyr::arrange(N,Time)%>%
dplyr::group_by(N) %>%
dplyr::mutate(mean_X=mean(X),
mean_Y=mean(Y),
demean_X=X-mean_X,
demean_Y=Y-mean_Y)
demean
fe_manual<-lm(demean_Y~demean_X-1, data=demean)
summary(fe_manual)
##
## Call:
## lm(formula = demean_Y ~ demean_X - 1, data = demean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20110 -0.31624 0.03042 0.29211 1.38997
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## demean_X -2.0504 0.1133 -18.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4581 on 249 degrees of freedom
## Multiple R-squared: 0.5682, Adjusted R-squared: 0.5665
## F-statistic: 327.6 on 1 and 249 DF, p-value: < 2.2e-16
demean %>%
ggplot(aes(x=demean_X, y=demean_Y))+
geom_point(aes(color=as.factor(N)))+
geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula = 'y ~ x'
The within estimator is typically introduced with the dummy variable notation - where N indicators for each unit (or N-1 and the intercept). These are equivalent in OLS (not in GLMs). Most software (plm in R, xtreg, fe in Stata) uses the within transform
fe_dummy <-
lm(Y~X+as.factor(N)-1, data=dirty)
summary(fe_dummy)
##
## Call:
## lm(formula = Y ~ X + as.factor(N) - 1, data = dirty)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20110 -0.31624 0.03042 0.29211 1.38997
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## X -2.05045 0.11443 -17.918 <2e-16 ***
## as.factor(N)1 7.06019 0.24521 28.793 <2e-16 ***
## as.factor(N)2 4.81727 0.18019 26.734 <2e-16 ***
## as.factor(N)3 4.04904 0.15350 26.378 <2e-16 ***
## as.factor(N)4 7.07162 0.24051 29.402 <2e-16 ***
## as.factor(N)5 0.74980 0.07785 9.631 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4627 on 244 degrees of freedom
## Multiple R-squared: 0.9576, Adjusted R-squared: 0.9566
## F-statistic: 919.4 on 6 and 244 DF, p-value: < 2.2e-16
summary(fe_manual)
##
## Call:
## lm(formula = demean_Y ~ demean_X - 1, data = demean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20110 -0.31624 0.03042 0.29211 1.38997
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## demean_X -2.0504 0.1133 -18.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4581 on 249 degrees of freedom
## Multiple R-squared: 0.5682, Adjusted R-squared: 0.5665
## F-statistic: 327.6 on 1 and 249 DF, p-value: < 2.2e-16
summary(fe)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Y ~ X, data = dirty, index = c("N", "Time"), method = "within")
##
## Balanced Panel: n = 5, T = 50, N = 250
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.201097 -0.316239 0.030415 0.292112 1.389968
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## X -2.05045 0.11443 -17.918 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 120.99
## Residual Sum of Squares: 52.246
## R-Squared: 0.56819
## Adj. R-Squared: 0.55934
## F-statistic: 321.059 on 1 and 244 DF, p-value: < 2.22e-16
Note that the standard errors are slightly different because the demeaned model doesn’t have the right degrees of freedom. You should use plm() for the within transform as it corrects for this.
Interpretation is standard, but with a caveat
summary(fe)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Y ~ X, data = dirty, index = c("N", "Time"), method = "within")
##
## Balanced Panel: n = 5, T = 50, N = 250
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.201097 -0.316239 0.030415 0.292112 1.389968
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## X -2.05045 0.11443 -17.918 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 120.99
## Residual Sum of Squares: 52.246
## R-Squared: 0.56819
## Adj. R-Squared: 0.55934
## F-statistic: 321.059 on 1 and 244 DF, p-value: < 2.22e-16
sd(dirty$X)
## [1] 0.6731207
coef(fe)*sd(dirty$X)
## X
## -1.380197
sd(demean$demean_X)
## [1] 0.2562567
coef(fe)*sd(demean$demean_X)
## X
## -0.5254403
Takeaway: Coefficients on Linear FE regressors are correctly interpreted as the effect of a one unit increase in the within unit X. The within unit variance is always no larger than the total variance in X (often significantly smaller, here about 1/2). Calculate substantive effects using within X. (Mummolo and Peterson (2018))
time_inv<-dirty
time_inv$K<-ifelse(dirty$N<=2, 1, 0)
time_inv$Y<-3-2*time_inv$X+4*time_inv$Z-3*time_inv$K+rnorm(nrow(time_inv),0, .5)
drop_model<-plm(Y~X+K, model="within", index = c("N", "Time"), data=time_inv)
summary(drop_model)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = Y ~ X + K, data = time_inv, model = "within", index = c("N",
## "Time"))
##
## Balanced Panel: n = 5, T = 50, N = 250
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.378685 -0.340800 -0.001684 0.362874 1.413960
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## X -2.13107 0.12399 -17.188 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 135.59
## Residual Sum of Squares: 61.334
## R-Squared: 0.54766
## Adj. R-Squared: 0.53839
## F-statistic: 295.418 on 1 and 244 DF, p-value: < 2.22e-16
#Why not?
time_inv %>%
group_by(N) %>%
mutate(var_k=K-mean(K)) %>%
ungroup() %>%
select(var_k) %>%
summary()
## var_k
## Min. :0
## 1st Qu.:0
## Median :0
## Mean :0
## 3rd Qu.:0
## Max. :0
The FE estimator sweeps out all variation on K, because K only has between-unit variance, not within unit variance.
data %>%
group_by(ccode) %>%
summarize(var_dem=var(na.omit(vdem_dem)))
logit<-clogit(vdem_dem~LMILEX+strata(ccode), data=data)
summary(logit)
## Call:
## coxph(formula = Surv(rep(1, 6173L), vdem_dem) ~ LMILEX + strata(ccode),
## data = data, method = "exact")
##
## n= 6147, number of events= 1991
## (26 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## LMILEX 0.80977 2.24739 0.08244 9.823 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## LMILEX 2.247 0.445 1.912 2.641
##
## Concordance= 0.656 (se = 0.018 )
## Likelihood ratio test= 109.2 on 1 df, p=<2e-16
## Wald test = 96.49 on 1 df, p=<2e-16
## Score (logrank) test = 102.9 on 1 df, p=<2e-16
A lot has been made of this issue, but does it matter? (Cook, Hays, and Franzese (2020))
“Feels” problematic that the model ignores a portion of the data. Can be severe if the event in question is rare/common.
Concrete issues:
No estimates of FE, so no individual MEs (similar to Cox PH model).
Biased estimates of baseline rate of event because non-event units are removed. Correct interpretation is baseline rate for units that experience the event.
Cook, Hays, and Franzese (2020) recast the unconditional (unit-dummy) FE problem as one of separation.
Separation occurs when a variable perfectly predicts the DV- here the unit does. (Imagine a table of Y and N, if one or more cells are empty, this is separation)
ML solutions do not exist under separation- the LL is flat because fit can always be improved by pushing the separating coefficient more towards negative (positive) infinity.
Well known solution is to penalize the maximum likelihood to punish large coefficients (may be familiar as ridge/lasso, but specific penalty is different.)
penalized_fe<-glm(vdem_dem~LMILEX+as.factor(ccode), data=data,
familty="binomial"(link="logit"), method="brglmFit")
summary(penalized_fe)
##
## Call:
## glm(formula = vdem_dem ~ LMILEX + as.factor(ccode), data = data,
## method = "brglmFit", familty = binomial(link = "logit"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.94595 -0.06031 -0.00950 0.01896 1.02400
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.491385 0.063109 7.786 6.90e-15 ***
## LMILEX 0.041110 0.004237 9.703 < 2e-16 ***
## as.factor(ccode)20 0.126568 0.051389 2.463 0.013780 *
## as.factor(ccode)40 -0.792386 0.055943 -14.164 < 2e-16 ***
## as.factor(ccode)41 -0.678995 0.060365 -11.248 < 2e-16 ***
## as.factor(ccode)42 -0.443418 0.058097 -7.632 2.31e-14 ***
## as.factor(ccode)51 0.138374 0.062976 2.197 0.028002 *
## as.factor(ccode)52 0.322329 0.064463 5.000 5.73e-07 ***
## as.factor(ccode)70 -0.694818 0.054017 -12.863 < 2e-16 ***
## as.factor(ccode)90 -0.636491 0.057657 -11.039 < 2e-16 ***
## as.factor(ccode)91 -0.490645 0.059051 -8.309 < 2e-16 ***
## as.factor(ccode)92 -0.667198 0.058148 -11.474 < 2e-16 ***
## as.factor(ccode)93 -0.454701 0.059681 -7.619 2.56e-14 ***
## as.factor(ccode)94 0.330924 0.060281 5.490 4.03e-08 ***
## as.factor(ccode)95 -0.430687 0.062181 -6.926 4.32e-12 ***
## as.factor(ccode)100 -0.588261 0.054285 -10.837 < 2e-16 ***
## as.factor(ccode)101 -0.037458 0.053911 -0.695 0.487172
## as.factor(ccode)110 -0.552961 0.066478 -8.318 < 2e-16 ***
## as.factor(ccode)130 -0.321851 0.056378 -5.709 1.14e-08 ***
## as.factor(ccode)135 -0.561296 0.054577 -10.284 < 2e-16 ***
## as.factor(ccode)140 -0.569250 0.052077 -10.931 < 2e-16 ***
## as.factor(ccode)145 -0.405796 0.058115 -6.983 2.90e-12 ***
## as.factor(ccode)150 -0.540675 0.058539 -9.236 < 2e-16 ***
## as.factor(ccode)155 -0.307285 0.053527 -5.741 9.43e-09 ***
## as.factor(ccode)160 -0.409081 0.052735 -7.757 8.67e-15 ***
## as.factor(ccode)165 -0.006511 0.057142 -0.114 0.909276
## as.factor(ccode)200 0.081618 0.050413 1.619 0.105447
## as.factor(ccode)205 0.252106 0.056087 4.495 6.96e-06 ***
## as.factor(ccode)210 0.149191 0.052030 2.867 0.004138 **
## as.factor(ccode)211 0.164498 0.052517 3.132 0.001735 **
## as.factor(ccode)212 0.310687 0.059126 5.255 1.48e-07 ***
## as.factor(ccode)220 0.083969 0.050454 1.664 0.096057 .
## as.factor(ccode)225 0.180438 0.053070 3.400 0.000674 ***
## as.factor(ccode)230 -0.371570 0.052650 -7.057 1.70e-12 ***
## as.factor(ccode)235 -0.305282 0.053604 -5.695 1.23e-08 ***
## as.factor(ccode)260 0.085662 0.051826 1.653 0.098351 .
## as.factor(ccode)265 -0.869724 0.057335 -15.169 < 2e-16 ***
## as.factor(ccode)290 -0.646118 0.051585 -12.525 < 2e-16 ***
## as.factor(ccode)305 0.206396 0.055321 3.731 0.000191 ***
## as.factor(ccode)310 -0.595603 0.053288 -11.177 < 2e-16 ***
## as.factor(ccode)315 -0.791519 0.053903 -14.684 < 2e-16 ***
## as.factor(ccode)317 0.223462 0.102906 2.172 0.029891 *
## as.factor(ccode)325 0.116907 0.051145 2.286 0.022267 *
## as.factor(ccode)339 -0.742123 0.056365 -13.166 < 2e-16 ***
## as.factor(ccode)343 -0.562525 0.104791 -5.368 7.96e-08 ***
## as.factor(ccode)344 -0.682631 0.096692 -7.060 1.67e-12 ***
## as.factor(ccode)345 -0.831925 0.052877 -15.733 < 2e-16 ***
## as.factor(ccode)346 -0.793540 0.149255 -5.317 1.06e-07 ***
## as.factor(ccode)349 0.257207 0.098280 2.617 0.008868 **
## as.factor(ccode)350 -0.304609 0.052890 -5.759 8.45e-09 ***
## as.factor(ccode)352 -0.031107 0.060802 -0.512 0.608916
## as.factor(ccode)355 -0.614995 0.053237 -11.552 < 2e-16 ***
## as.factor(ccode)359 0.320464 0.095860 3.343 0.000829 ***
## as.factor(ccode)360 -0.639897 0.052373 -12.218 < 2e-16 ***
## as.factor(ccode)365 -0.965736 0.049728 -19.420 < 2e-16 ***
## as.factor(ccode)366 0.196969 0.095428 2.064 0.039011 *
## as.factor(ccode)367 0.305745 0.095348 3.207 0.001343 **
## as.factor(ccode)368 0.293513 0.094939 3.092 0.001991 **
## as.factor(ccode)369 -0.409989 0.091233 -4.494 6.99e-06 ***
## as.factor(ccode)370 -0.342508 0.092631 -3.698 0.000218 ***
## as.factor(ccode)371 -0.478555 0.098685 -4.849 1.24e-06 ***
## as.factor(ccode)372 -0.724568 0.098803 -7.333 2.24e-13 ***
## as.factor(ccode)373 -0.754047 0.093493 -8.065 7.30e-16 ***
## as.factor(ccode)375 0.213910 0.054375 3.934 8.36e-05 ***
## as.factor(ccode)380 0.156259 0.052250 2.991 0.002784 **
## as.factor(ccode)385 0.186786 0.053303 3.504 0.000458 ***
## as.factor(ccode)390 0.191175 0.053468 3.575 0.000350 ***
## as.factor(ccode)395 0.496515 0.071337 6.960 3.40e-12 ***
## as.factor(ccode)404 -0.610757 0.076064 -8.029 9.79e-16 ***
## as.factor(ccode)411 -0.609188 0.079303 -7.682 1.57e-14 ***
## as.factor(ccode)420 -0.517902 0.072629 -7.131 9.98e-13 ***
## as.factor(ccode)432 -0.495354 0.061361 -8.073 6.87e-16 ***
## as.factor(ccode)433 -0.381435 0.060785 -6.275 3.49e-10 ***
## as.factor(ccode)434 -0.462723 0.063173 -7.325 2.39e-13 ***
## as.factor(ccode)435 -0.681405 0.062109 -10.971 < 2e-16 ***
## as.factor(ccode)436 -0.556325 0.063513 -8.759 < 2e-16 ***
## as.factor(ccode)437 -0.717136 0.060244 -11.904 < 2e-16 ***
## as.factor(ccode)438 -0.700073 0.063304 -11.059 < 2e-16 ***
## as.factor(ccode)439 -0.631367 0.062111 -10.165 < 2e-16 ***
## as.factor(ccode)450 -0.661676 0.062087 -10.657 < 2e-16 ***
## as.factor(ccode)451 -0.654075 0.063953 -10.227 < 2e-16 ***
## as.factor(ccode)452 -0.570108 0.060096 -9.487 < 2e-16 ***
## as.factor(ccode)461 -0.663404 0.063109 -10.512 < 2e-16 ***
## as.factor(ccode)471 -0.714888 0.060356 -11.844 < 2e-16 ***
## as.factor(ccode)475 -0.780829 0.057356 -13.614 < 2e-16 ***
## as.factor(ccode)481 -0.693590 0.062465 -11.104 < 2e-16 ***
## as.factor(ccode)482 -0.654781 0.063602 -10.295 < 2e-16 ***
## as.factor(ccode)483 -0.685601 0.061882 -11.079 < 2e-16 ***
## as.factor(ccode)484 -0.698266 0.061869 -11.286 < 2e-16 ***
## as.factor(ccode)490 -0.729608 0.059963 -12.168 < 2e-16 ***
## as.factor(ccode)500 -0.725594 0.060864 -11.922 < 2e-16 ***
## as.factor(ccode)501 -0.739933 0.060192 -12.293 < 2e-16 ***
## as.factor(ccode)510 -0.595615 0.060243 -9.887 < 2e-16 ***
## as.factor(ccode)516 -0.683478 0.062648 -10.910 < 2e-16 ***
## as.factor(ccode)517 -0.684084 0.062616 -10.925 < 2e-16 ***
## as.factor(ccode)520 -0.696050 0.062708 -11.100 < 2e-16 ***
## as.factor(ccode)522 -0.674614 0.071885 -9.385 < 2e-16 ***
## as.factor(ccode)530 -0.761208 0.057546 -13.228 < 2e-16 ***
## as.factor(ccode)531 -0.752980 0.103476 -7.277 3.42e-13 ***
## as.factor(ccode)540 -0.780157 0.068483 -11.392 < 2e-16 ***
## as.factor(ccode)541 -0.718873 0.104396 -6.886 5.74e-12 ***
## as.factor(ccode)551 -0.697581 0.061255 -11.388 < 2e-16 ***
## as.factor(ccode)552 -0.784212 0.061553 -12.740 < 2e-16 ***
## as.factor(ccode)553 -0.522445 0.064569 -8.091 5.91e-16 ***
## as.factor(ccode)560 -0.699541 0.053071 -13.181 < 2e-16 ***
## as.factor(ccode)565 0.187941 0.091064 2.064 0.039033 *
## as.factor(ccode)570 -0.613353 0.068116 -9.005 < 2e-16 ***
## as.factor(ccode)571 0.352672 0.066129 5.333 9.65e-08 ***
## as.factor(ccode)572 -0.644466 0.067142 -9.599 < 2e-16 ***
## as.factor(ccode)580 -0.522315 0.061258 -8.526 < 2e-16 ***
## as.factor(ccode)581 -0.601113 0.123607 -4.863 1.16e-06 ***
## as.factor(ccode)590 0.394685 0.069426 5.685 1.31e-08 ***
## as.factor(ccode)600 -0.798528 0.055410 -14.411 < 2e-16 ***
## as.factor(ccode)615 -0.802669 0.057222 -14.027 < 2e-16 ***
## as.factor(ccode)616 -0.749549 0.058378 -12.839 < 2e-16 ***
## as.factor(ccode)620 -0.789606 0.057006 -13.851 < 2e-16 ***
## as.factor(ccode)625 -0.755617 0.057214 -13.207 < 2e-16 ***
## as.factor(ccode)630 -0.842391 0.052533 -16.035 < 2e-16 ***
## as.factor(ccode)640 -0.303084 0.052270 -5.798 6.70e-09 ***
## as.factor(ccode)645 -0.823094 0.052944 -15.546 < 2e-16 ***
## as.factor(ccode)651 -0.842790 0.052280 -16.121 < 2e-16 ***
## as.factor(ccode)652 -0.794184 0.054767 -14.501 < 2e-16 ***
## as.factor(ccode)660 -0.735204 0.060046 -12.244 < 2e-16 ***
## as.factor(ccode)663 -0.767191 0.055194 -13.900 < 2e-16 ***
## as.factor(ccode)666 0.166252 0.052576 3.162 0.001566 **
## as.factor(ccode)670 -0.856081 0.052114 -16.427 < 2e-16 ***
## as.factor(ccode)678 -0.749048 0.058401 -12.826 < 2e-16 ***
## as.factor(ccode)680 -0.733856 0.069248 -10.598 < 2e-16 ***
## as.factor(ccode)690 -0.811638 0.057688 -14.069 < 2e-16 ***
## as.factor(ccode)692 -0.735508 0.064095 -11.475 < 2e-16 ***
## as.factor(ccode)694 -0.792420 0.065319 -12.131 < 2e-16 ***
## as.factor(ccode)696 -0.800020 0.061563 -12.995 < 2e-16 ***
## as.factor(ccode)698 -0.824992 0.060750 -13.580 < 2e-16 ***
## as.factor(ccode)700 -0.719571 0.059508 -12.092 < 2e-16 ***
## as.factor(ccode)701 -0.754482 0.093480 -8.071 6.97e-16 ***
## as.factor(ccode)702 -0.692250 0.095416 -7.255 4.01e-13 ***
## as.factor(ccode)703 -0.733018 0.094103 -7.790 6.73e-15 ***
## as.factor(ccode)704 -0.784301 0.102730 -7.635 2.26e-14 ***
## as.factor(ccode)705 -0.783171 0.092724 -8.446 < 2e-16 ***
## as.factor(ccode)710 -0.931617 0.050203 -18.557 < 2e-16 ***
## as.factor(ccode)712 -0.500237 0.058532 -8.546 < 2e-16 ***
## as.factor(ccode)713 -0.735913 0.052504 -14.016 < 2e-16 ***
## as.factor(ccode)731 -0.836706 0.054674 -15.304 < 2e-16 ***
## as.factor(ccode)732 -0.566254 0.052833 -10.718 < 2e-16 ***
## as.factor(ccode)740 0.122105 0.051774 2.358 0.018352 *
## as.factor(ccode)750 0.060654 0.051238 1.184 0.236504
## as.factor(ccode)760 -0.661069 0.151804 -4.355 1.33e-05 ***
## as.factor(ccode)770 -0.835396 0.052521 -15.906 < 2e-16 ***
## as.factor(ccode)771 -0.456772 0.062959 -7.255 4.01e-13 ***
## as.factor(ccode)775 -0.775457 0.054829 -14.143 < 2e-16 ***
## as.factor(ccode)780 0.010949 0.057012 0.192 0.847706
## as.factor(ccode)790 -0.679526 0.059889 -11.346 < 2e-16 ***
## as.factor(ccode)800 -0.745020 0.053614 -13.896 < 2e-16 ***
## as.factor(ccode)811 -0.752347 0.060856 -12.363 < 2e-16 ***
## as.factor(ccode)812 -0.688754 0.062032 -11.103 < 2e-16 ***
## as.factor(ccode)816 -0.809494 0.061243 -13.218 < 2e-16 ***
## as.factor(ccode)817 -0.812733 0.068529 -11.860 < 2e-16 ***
## as.factor(ccode)820 -0.803381 0.055519 -14.470 < 2e-16 ***
## as.factor(ccode)830 -0.797968 0.058597 -13.618 < 2e-16 ***
## as.factor(ccode)840 -0.525790 0.054388 -9.667 < 2e-16 ***
## as.factor(ccode)850 -0.755156 0.052529 -14.376 < 2e-16 ***
## as.factor(ccode)900 0.150454 0.052068 2.890 0.003858 **
## as.factor(ccode)910 0.137292 0.068152 2.015 0.043957 *
## as.factor(ccode)920 0.226653 0.054921 4.127 3.68e-05 ***
## as.factor(ccode)940 0.473524 0.083087 5.699 1.20e-08 ***
## as.factor(ccode)950 0.121454 0.068052 1.785 0.074307 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.06176733)
##
## Null deviance: 1346.12 on 6146 degrees of freedom
## Residual deviance: 369.39 on 5981 degrees of freedom
## (26 observations deleted due to missingness)
## AIC: 493.94
##
## Type of estimator: AS_mixed (mixed bias-reducing adjusted score equations)
## Number of Fisher Scoring iterations: 2
Note that the unconditional FE estimator is biased due to the incidental parameters problem, which can be severe in small T. Penalizing the ML does not change this.
Incidental parameters problem is just a special case of well known small sample bias of MLE (MLE is consistent, not unbiased, and consistency relies on asymptotics).
As N grows without T, the number of ‘incidental parameters’ (the unit dummies) grows, so we are using T periods to estimate a greater and greater number of parameters.
What if \(Z\) does not cause \(X\)?
There’s no OVB (recall necessary conditions)
FE estimator is still unbiased
Pooled estimator unbiased, but incorrect SE/t-stats because unit effect creates serial correlation in errors
#Make some data to better control the relationships
N<-seq(1,5,1)
Time<-seq(1,50,1)
random<-expand.grid(N, Time)
names(random)<-c("N", "Time")
random<- random %>%
dplyr::arrange(Time, N)
#Draw the unobserved effect, IV, DV, error
Z<-rnorm(length(unique(random$N)), -2, .5)
random$Z<-rep(Z, length(Time))
random$X<-rnorm(nrow(random), 1, .25)
random$Y<-3-2*random$X+4*random$Z+rnorm(nrow(random),0, .5)
#Run the pooled model
random_pool<-lm(Y~X, data=random)
summary(random_pool)
##
## Call:
## lm(formula = Y ~ X, data = random)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.37207 -0.46639 -0.01573 0.41409 1.76523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.3626 0.1751 -24.92 <2e-16 ***
## X -2.1709 0.1712 -12.68 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6552 on 248 degrees of freedom
## Multiple R-squared: 0.3933, Adjusted R-squared: 0.3909
## F-statistic: 160.8 on 1 and 248 DF, p-value: < 2.2e-16
#Plot the pooled model - what are we looking at?
random %>%
ggplot(aes(x=X, y=Y))+
geom_point(aes(color=as.factor(N)))+
geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula = 'y ~ x'
#Plot unit intercepts/slopes
random %>%
ggplot(aes(x=X, y=Y, color=as.factor(N)))+
geom_point()+
geom_smooth(method="lm", se=F)
## `geom_smooth()` using formula = 'y ~ x'
#RE estimator
re<-plm(Y~X, random, model = "random")
summary(re)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = Y ~ X, data = random, model = "random")
##
## Balanced Panel: n = 5, T = 50, N = 250
##
## Effects:
## var std.dev share
## idiosyncratic 0.2399 0.4898 0.472
## individual 0.2683 0.5180 0.528
## theta: 0.8675
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.286605 -0.337148 0.031001 0.348416 1.176522
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) -4.51709 0.26738 -16.894 < 2.2e-16 ***
## X -2.01547 0.13112 -15.371 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 115.95
## Residual Sum of Squares: 59.377
## R-Squared: 0.48789
## Adj. R-Squared: 0.48582
## Chisq: 236.267 on 1 DF, p-value: < 2.22e-16
#True error
random$error <-random$Y-(3-2*random$X+4*random$Z)
#Get estimated error from pooled model
random$resid<-resid(random_pool)
random %>%
dplyr::select(error, resid) %>%
pivot_longer(cols = everything()) %>%
ggplot()+
geom_histogram(aes(value, fill=name, alpha=.2))+
guides(alpha="none")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
e_resid = sum((random$resid)^2)/248
e_error = sum((random$error)^2)/250
e_resid
## [1] 0.4292928
e_error
## [1] 0.2364186
# How does this affect beta se and t-stats?
sqrt(e_resid / sum((random$X - mean(random$X))^2))
## [1] 0.171202
sqrt(e_error / sum((random$X - mean(random$X))^2))
## [1] 0.1270495
summary(random_pool)
##
## Call:
## lm(formula = Y ~ X, data = random)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.37207 -0.46639 -0.01573 0.41409 1.76523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.3626 0.1751 -24.92 <2e-16 ***
## X -2.1709 0.1712 -12.68 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6552 on 248 degrees of freedom
## Multiple R-squared: 0.3933, Adjusted R-squared: 0.3909
## F-statistic: 160.8 on 1 and 248 DF, p-value: < 2.2e-16
summary(re)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = Y ~ X, data = random, model = "random")
##
## Balanced Panel: n = 5, T = 50, N = 250
##
## Effects:
## var std.dev share
## idiosyncratic 0.2399 0.4898 0.472
## individual 0.2683 0.5180 0.528
## theta: 0.8675
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.286605 -0.337148 0.031001 0.348416 1.176522
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) -4.51709 0.26738 -16.894 < 2.2e-16 ***
## X -2.01547 0.13112 -15.371 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 115.95
## Residual Sum of Squares: 59.377
## R-Squared: 0.48789
## Adj. R-Squared: 0.48582
## Chisq: 236.267 on 1 DF, p-value: < 2.22e-16
We can see that our SEs (and by extension t-stats) for the pooled model are incorrect. This is becuase our estimate of \(\sigma^2\) (variance of the error term) is wrong because the pooled estimator includes the effect of \(Z\) on \(Y\) in the variance.
Random effects purge the variance contributed by \(Z\) and therefore return SE that match SE calculated with the known error term.
Since the pooled model is unbiased, we can simply correct the standard error esitmates. Here using cluster-robust SE.
However, cluster-robust SE are asymptotically correct (in the number of units \(N\) not the number of observations). Our N is 5, which is…not asymptotic.
coeftest(random_pool, vcov=vcovHC(random_pool, type = "HC0", group="N"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.36264 0.16725 -26.084 < 2.2e-16 ***
## X -2.17094 0.16827 -12.902 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These look very similar to our uncorrected SEs.
Make the data bigger.
#Make some data to better control the relationships
N_large<-seq(1,5000,1)
Time<-seq(1,50,1)
random_large<-expand.grid(N_large, Time)
names(random_large)<-c("N", "Time")
random_large<- random_large %>%
dplyr::arrange(Time, N)
#Draw the unobserved effect, IV, DV, error
Z_large<-rnorm(length(unique(random_large$N)), -2, .5)
random_large$Z<-rep(Z_large, length(Time))
random_large$X<-rnorm(nrow(random_large), 1, .25)
random_large$Y<-3-2*random_large$X+4*random_large$Z+rnorm(nrow(random_large),0, .5)
#Run the pooled model
random_pool_large<-lm(Y~X, data=random_large)
summary(random_pool_large, cluster="N")
##
## Call:
## lm(formula = Y ~ X, data = random_large)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.8592 -1.3734 0.0167 1.4012 7.3509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.97475 0.01707 -291.4 <2e-16 ***
## X -2.02061 0.01656 -122.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.072 on 249998 degrees of freedom
## Multiple R-squared: 0.0562, Adjusted R-squared: 0.0562
## F-statistic: 1.489e+04 on 1 and 249998 DF, p-value: < 2.2e-16
large_e_error <- sum((random_large$Y -(2*random_large$X+4*random_large$Z))^2)/(length(N_large))
sqrt(large_e_error / sum((random_large$X - mean(random_large$X))^2))
## [1] 0.08475192
In order to be unbiased, the Random Effects estimator requires that there is no correlation between the unit effects and the explanatory variables.
This assumption is typically tested with a Hausman test comparing an FE and RE model with the same specification
#Fit a FE model to the same data
fe<-plm(Y~X, random, model = "within")
#Hausman Test
phtest(fe, re)
##
## Hausman Test
##
## data: Y ~ X
## chisq = 0.17793, df = 1, p-value = 0.6732
## alternative hypothesis: one model is inconsistent
#Note, order of the arguments does not matter
#Can also calculate the Hausman statistic by hand
t(coef(fe) - coef(re)[2])%*% solve(vcov(fe)-vcov(re)[2,2]) %*% (coef(fe)-coef(re)[2])
## [,1]
## [1,] 0.1779346
The null hypothesis of the Hausman test is (roughly) that the coefficient vectors are equal in both the FE and RE models. Rejection indicates the two vectors are statistically different. Rejection is typically taken to suggest violation of the RE assumption.
Recall that both the FE and RE models are unbiased under the RE assumptions (the former is not efficient). Thus, if the RE assumptions hold, we would not expect a difference. In contrast, if the RE assumption regarding the correlation of the individual effects with the independent variables is violated, only the FE estimator is unbiased and the RE estimator is biased. Therefore, we would expect differences in the estimates.
Rather than choosing between fixed and random effects, we can estimate a compound model.
Recall that the goal of both the FE and RE models is to account for unit-level heterogeneity. The fundamental difference is that the RE model requires no covariance between the unit effects and the error. The FE model is agnostic on this because it eliminates all variation on all variables due to unit effects via the within transform (subtracting the unit means from each realization of \(Y\), \(X\))
The risk of the RE model, then, can be thought of as correlation between these unit means and the error term. As with standard OVB, putting the unit means into the equation will remove the bias and allow the RE estimator to recover the unbiased coefficients on \(X\).
The traditional approach comes from Mundlak (1978) and involves estimating the following equation with the RE estimator:
\(Y_{it}=\beta_1X_{it}+\beta_2\overline{X_i}+e_{it}\)
Where \(\overline{X_i}\) is the unit level mean of \(X\)
Alternatively, the time-specific realizations of \(X\) can be within-transformed prior to estimation.
\(Y_{it}=\beta_1(X_{it}-\overline{X_i})+\beta_2\overline{X_i}+e_{it}\)
The specific model you should use depends on which coefficients are more of interest: the individual or aggregate. The former (Mundlak) specification facilitates interpretation of the aggregate (group mean) variable.
The group demeaning in the latter approach ensures that the individual realizations are uncorrelated with the group-level mean. As a result, the group-level term cannot be interpreted, because it does not control for the effect of the individual realizations.
#We'll fit the second equation to our RE data
#First, we have to group demean
demean_random<-random %>%
dplyr::arrange(N,Time)%>%
group_by(N) %>%
mutate(mean_X=mean(X),
demean_X=X-mean_X)
# Now include both X terms in the RE model
rebw<-plm(Y~demean_X + mean_X, demean_random, model = "random")
coef(rebw)
## (Intercept) demean_X mean_X
## -1.378370 -2.012543 -5.174792
coef(fe)
## X
## -2.012543
Notice that the coefficient on the within effect from the REBW model is the same as the FE estimate.
We can also estimate the Mundlak version
#Mundlak formulation
mundlak<-plm(Y~X + mean_X, demean_random, model = "random")
coef(mundlak)
## (Intercept) X mean_X
## -1.378370 -2.012543 -3.162249
Again, we’ve successfully purged the coefficients on the individual realizations of \(X\). Our coefficient on \(\overline{X}\) has changed. The REWB estimate was -5.2 which we can see here is actually the sum of the between and within coefficients from the Mundlak estimator of -3.2 and -2
#Check how our data look
data %>%
dplyr::select(c(ccode, year, LMILEX, v2x_polyarchy)) %>%
filter(ccode<100) %>%
pivot_longer(cols = c(LMILEX, v2x_polyarchy), values_to = "values")%>%
ggplot(aes(y=values, x=year, color=as.factor(ccode)))+
geom_line()+
facet_wrap(~name, scales="free_y")
Many time series are persistent - current values depend on past values.
Formally, we can define an autoregressive process of order one AR(1) series as one that can be written
\[Y_t = \rho_1Y_{t-1}+\epsilon_t\] where \(\epsilon_t\) is iid. Order one means that \(Y\) is only a function of one past realization.
\(Y\) is stationariy if \(|\rho_1| < 1\). Informally, this means the correlations between \(Y\) and its past realizations is asymptotically 0. Over a long enough time period, the past is irrelevant.
When \(|\rho_1| = 1\) \(Y\) is said to have a unit root. We also call such a series as integrated of order one, or I(1). An I(1) series is only one type of unit root, and determining which kind of unit root is difficult.
Such series are problematic for modeling. First, correlations with the past do not die out, even asymptotically. Second, unit root series have increasing variance over time, meaning the central limit theorem does not hold (test statistics are asymptotically normal), so no hypothesis tests are valid. Third, regressions with non-stationary variables can produce spurious inferences - two integrated series are correlated due to integration, not a causal relationship.
Let’s see:
errors <- rnorm(1001,0,1)
unit_root <- cumsum(errors)
unit_root_data <- data.frame(error=errors, unit_root=unit_root, time=seq(1,1001,1))
unit_root_data <- unit_root_data %>%
mutate(stationary1 = .5*dplyr::lag(error)+rnorm(1,0,1),
stationary2 = .9*dplyr::lag(error)+rnorm(1,0,1),
first_diff = unit_root - dplyr::lag(unit_root))
unit_root_data %>%
dplyr::select(-error) %>%
pivot_longer(-time) %>%
ggplot(aes(y=value, x=time, color=name))+
geom_line()
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_line()`).
You can see that over a long enough time period the persistence of the series is irrelevant - it’s still mean reverting. In contrast, the unit root gets “stuck” at high or low values and stays there. Typically these are less obvious.
I(1) series are difference stationary meaning that if we subtract the prior value, the series is stationary.
\[Y_t = \rho_1Y_{t-1}+\epsilon_t\] \[Y_t = Y_{t-1}+\epsilon_t\] \[Y_t - Y_{t-1} = \epsilon_t\] Typically we denote a differenced series with \(\Delta\), e.g. \(\Delta Y\)
If a series has a unit root, then \(\rho\) = 1. We could formally test this with a regression of \(Y_t\) against its past values, and use a hypothesis test on the coefficient. This is the basis of the Dickey-Fuller test. However, we use different critical values since the central limit theorem is violated by a unit root, and we typically use the Augmented Dickey-Fuller, which includes additional lags to clean up residual serial correlation. Note, that the function we’re using automatically determines these based on series length.
apply(unit_root_data, 2, function(x) tseries::adf.test(na.omit(x)))
## Warning in tseries::adf.test(na.omit(x)): p-value smaller than printed p-value
## Warning in tseries::adf.test(na.omit(x)): p-value greater than printed p-value
## Warning in tseries::adf.test(na.omit(x)): p-value smaller than printed p-value
## Warning in tseries::adf.test(na.omit(x)): p-value smaller than printed p-value
## Warning in tseries::adf.test(na.omit(x)): p-value smaller than printed p-value
## $error
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(x)
## Dickey-Fuller = -10.849, Lag order = 9, p-value = 0.01
## alternative hypothesis: stationary
##
##
## $unit_root
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(x)
## Dickey-Fuller = -1.839, Lag order = 9, p-value = 0.6465
## alternative hypothesis: stationary
##
##
## $time
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(x)
## Dickey-Fuller = 1.7321, Lag order = 9, p-value = 0.99
## alternative hypothesis: stationary
##
##
## $stationary1
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(x)
## Dickey-Fuller = -10.848, Lag order = 9, p-value = 0.01
## alternative hypothesis: stationary
##
##
## $stationary2
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(x)
## Dickey-Fuller = -10.848, Lag order = 9, p-value = 0.01
## alternative hypothesis: stationary
##
##
## $first_diff
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(x)
## Dickey-Fuller = -10.744, Lag order = 9, p-value = 0.01
## alternative hypothesis: stationary
As noted by the test, the null hypothesis is that the series has a unit root. Thus at a small p-value we reject the null and treat the series as stationary.
Panel unit root testing is difficult and R implementations are somewhat basic
We can perform our own test via the p-value aggregation method of Choi (2001)
We perform an augmented Dickey-Fuller test on each panel, aggregate the p-values
Specific aggregation is \(-2\sum_{n}^{N} ln(p_n)\)
Test statistic is distributed \(\chi^2(2N)\)
N here indicates number of panels
We’re going to loop over the panels, perform the test, store the results
(You could also do this as a vectorized function…)
pvals <- NULL
for (i in unique(data$ccode)){
try(dfuller <- adf.test(data[data$ccode==i, "LMILEX"], k = 1))
pvals<-rbind(pvals, dfuller$p.value)
}
## Error in res.sum$coefficients[2, 1] : subscript out of bounds
## Error in res.sum$coefficients[2, 1] : subscript out of bounds
head(pvals)
## [,1]
## [1,] 0.75195137
## [2,] 0.08705546
## [3,] 0.01000000
## [4,] 0.12421690
## [5,] 0.40692124
## [6,] 0.07291725
#Choi's P-value
test_stat<- -2*sum(log(na.omit(pvals)))
test_stat
## [1] 1091.508
#That's going to reject, but let's get a p-value for completeness
#Test is distributed chi2(2N) where N is panels
round(pchisq(test_stat, df=2*nrow(na.omit(pvals)), lower.tail = F), digits=3)
## [1] 0
In the p-value aggregation chase, we are testing a specific null hypothesis that all the panels have a unit root against the alternative that at least one is stationary.
That might not be the most reassuring rejection; other tests have different nulls.
The panel ADF didn’t indicate a need for differencing, but let’s to illustrate.
(Usual caveats here; first differencing is only appropriate for I(1) series, you need to test the differenced series for stationarity, etc.)
#Difference these out to be safe
fd<-pdata.frame(data, index = c("ccode", "year"))
fd$d_lmilex=diff(fd$LMILEX)
fd$d_polyarchy=diff(fd$v2x_polyarchy)
fd %>%
dplyr::select(c(ccode, year, d_lmilex, d_polyarchy)) %>%
filter(ccode %in% seq(2,99,1)) %>%
pivot_longer(cols = c(d_lmilex, d_polyarchy), values_to = "values")%>%
ggplot(aes(y=values, x=year, color=as.factor(ccode)))+
geom_point()+
facet_wrap(~name, scales="free_y")
## Warning: Removed 50 rows containing missing values or values outside the scale range
## (`geom_point()`).
More details in De Boef and Keele (2008)
static<-lm(d_lmilex~d_polyarchy, data=fd)
summary(static)
##
## Call:
## lm(formula = d_lmilex ~ d_polyarchy, data = fd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6651 -0.0847 -0.0078 0.0856 6.2731
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.038012 0.004032 9.428 < 2e-16 ***
## d_polyarchy -0.237173 0.090946 -2.608 0.00913 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.309 on 5919 degrees of freedom
## (252 observations deleted due to missingness)
## Multiple R-squared: 0.001148, Adjusted R-squared: 0.0009789
## F-statistic: 6.801 on 1 and 5919 DF, p-value: 0.009134
#Illustrate the impact of 1 SD change in polyarchy (temporary)
time<-seq(1,10,1)
dem<-c(rep(0,2),
sd(na.omit(fd$d_polyarchy)),
rep(0, 7))
pred_dat<-cbind.data.frame(time,dem)
pred_dat$milex_hat<-pred_dat$dem*coef(static)[2]
pred_dat %>%
ggplot(aes(x=time, y=milex_hat))+
geom_line()
If the regression equation is \(Y_t=\beta_0+\beta_1X_t+\epsilon_t\)
The effect of \(X_t\) on \(Y_t\) is \(\frac{\delta Y_t}{\delta X_t}\) which is \(\beta_1\)
Moving forward one period: \(Y_{t+1}=\beta_0+\beta_1X_{t+1}+\epsilon_{t+1}\)
The derivative of that regression with respect to \(X_t\) is zero.
fd$lag_polyarchy=lag(fd$d_polyarchy)
fd$lag_lmilex=lag(fd$d_lmilex)
lag1<-lm(d_lmilex~d_polyarchy+lag_polyarchy, data=fd)
summary(lag1)
##
## Call:
## lm(formula = d_lmilex ~ d_polyarchy + lag_polyarchy, data = fd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6610 -0.0818 -0.0049 0.0864 3.9774
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.034863 0.003791 9.197 <2e-16 ***
## d_polyarchy -0.211076 0.087885 -2.402 0.0164 *
## lag_polyarchy -0.167615 0.088118 -1.902 0.0572 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2847 on 5707 degrees of freedom
## (463 observations deleted due to missingness)
## Multiple R-squared: 0.002139, Adjusted R-squared: 0.001789
## F-statistic: 6.117 on 2 and 5707 DF, p-value: 0.00222
pred_dat<-pred_dat %>%
mutate(lag_dem=dplyr::lag(dem))
pred_dat$milex_hat<-pred_dat$dem*coef(lag1)[2]+pred_dat$lag_dem*coef(lag1)[3]
pred_dat %>%
ggplot(aes(x=time, y=milex_hat))+
geom_line()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
If the regression equation is \(Y_t=\beta_0+\beta_1X_t+\beta_2X_{t-1}+\epsilon_t\)
The effect of \(X_t\) on \(Y_t\) is \(\frac{\delta}{\delta X} \beta_0+\beta_1X_t+\beta_2X_{t-1}+\epsilon_t\) which is \(\beta_1\)
Moving forward one period: \(Y_{t+1}=\beta_0+\beta_1X_{t+1}+\beta_2X_t+\epsilon_{t+1}\)
The derivative of that regression with respect to \(X_t\) is \(\beta_2\)
Derivative in periods beyond \(t+1\) is zero as before.
ldv<-lm(d_lmilex~lag_lmilex+d_polyarchy, data=fd)
summary(ldv)
##
## Call:
## lm(formula = d_lmilex ~ lag_lmilex + d_polyarchy, data = fd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5186 -0.0860 -0.0057 0.0910 3.9934
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.038110 0.003791 10.053 < 2e-16 ***
## lag_lmilex -0.104232 0.012407 -8.401 < 2e-16 ***
## d_polyarchy -0.284258 0.085060 -3.342 0.000838 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2833 on 5708 degrees of freedom
## (462 observations deleted due to missingness)
## Multiple R-squared: 0.0137, Adjusted R-squared: 0.01336
## F-statistic: 39.65 on 2 and 5708 DF, p-value: < 2.2e-16
pred_dat[1:3,]$milex_hat<-coef(ldv)[3]*pred_dat[1:3,]$dem
pred_dat[4:10,]$milex_hat<-coef(ldv)[2]^(pred_dat[4:10,]$time-3)*coef(ldv)[3]*pred_dat[3,"dem"]
pred_dat %>%
ggplot(aes(x=time, y=milex_hat))+
geom_line()
#What about a permanent shift?
coef(ldv)
## (Intercept) lag_lmilex d_polyarchy
## 0.03810962 -0.10423174 -0.28425773
lrm<-coef(ldv)[3]*sd(na.omit(fd$d_polyarchy))/(1-coef(ldv)[2])
lrm
## d_polyarchy
## -0.01136926
In the case of a temporary shock to X, the effect decays geometrically by assumption (forced by the assumption of the model)
To see that, let’s consider the effect of \(X_t\) at time \(t+1\) and \(t+2\)
The general equation \(Y_{t}=\beta_0+\alpha Y_{t-1}+\beta_1X_{t}+\epsilon_{t}\)
At \(t+1\) we have
\(Y_{t+1}=\beta_0+\alpha Y_{t}+\beta_1X_{t+1}+\epsilon_{t+1}\)
Substitute for \(Y_t\) - drop out the error terms, constant for ease
\(Y_{t+1}=\alpha(\alpha Y_{t-1}+\beta_1X_{t})+\beta_1X_{t+1}\)
First derivative with respect to \(X_t\) is \(\alpha\times\beta_1\)
This makes intuitive sense - \(X_t\) affects \(Y_t\) by \(\beta_1\). In turn, \(Y_t\) affects \(Y_{t+1}\) by \(\alpha\), the extent to which \(Y\) is serially correlated.
Similarly at \(t+2\)
\(Y_{t+2}=\beta_0+\alpha Y_{t+1}+\beta_1X_{t+2}+\epsilon_{t+2}\)
Substitute for \(Y_{t+1}\)
\(Y_{t+2}=\beta_0+\alpha (\alpha Y_{t}+\beta_1X_{t+1})+\beta_1X_{t+2}+\epsilon_{t+2}\)
And again for \(Y_{t}\)
\(Y_{t+2}=\beta_0+\alpha [\alpha(\alpha Y_{t-1}+\beta_1X_{t})+\beta_1X_{t+1}]+\beta_1X_{t+2}+\epsilon_{t+2}\)
This is kind of ugly, but anything that’s not multiplied by \(X_t\) can go away, which is basically everything
We’re left with \(\alpha^2 \times \beta_1\)
Generalizes: the effect of \(X_t\) any arbitrary leads into the future is going to be \(\alpha^d \times \beta_1\) where \(d\) is however many leads we want.
Note that the effect of \(X_t\) decays because of the requirement that \(\vert \alpha \vert < 1\) - otherwise the effect does not decay at all (\(= 1\)) or increases in time (\(>1\)).
Haven’t made too much of this until now.
In general, serial correlation in the error is not a big issue (no bias / inconsistency).
We do get bad SE/t-stats under serial correlation.
Two approaches:
Just use OLS and clean up the SEs to be robust to serial correlation (HAC SEs, PCSEs)
Try to better model the dynamics to eliminate serial correlation - for example with deeper lags of \(Y\)
But, with an LDV, serial correlation in the error creates bias. (Achen (2000), Keele and Kelly (2006))
Need to be careful to ensure no serial correlation in error term of the model.
pbgtest(d_lmilex~lag_lmilex+d_polyarchy, data=fd, model="pooling", order=1)
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: d_lmilex ~ lag_lmilex + d_polyarchy
## chisq = 0.5923, df = 1, p-value = 0.4415
## alternative hypothesis: serial correlation in idiosyncratic errors
ldv<-plm(d_lmilex~lag_lmilex+d_polyarchy, data=fd, model="pooling")
res_df <- data.frame(residual = residuals(ldv), attr(residuals(ldv), "index"))
res_df<- res_df %>%
left_join(fd, by=c("ccode", "year"))
res_df$lag_res <- lag(res_df$residual)
res_df<-plm::pdata.frame(res_df, index=c("ccode", "year"))
wooldridge <- plm(residual~lag_res, model="pooling", data=res_df)
wooldridge_stat <- (coef(wooldridge)[2]+.5) / (sqrt(diag(vcov(wooldridge))[2]))
round(pchisq(wooldridge_stat, 1, lower.tail = F), digits=3)
## lag_res
## 0
Mixed results - Breusch - Godfrey test fails to reject, Wooldridge test rejects.
If we assume unit effects that impact \(Y\) (as in the FE/RE case), the LDV model is biased, even if we do not assume correlation between the unit effects and included variables (Wawro (2002) is an accessible summary).
This is because the static individual effects are relegated to the error term. The LDV is correlated with those static unit effects, so by putting it on the RHS we’ve created bias.
Unlike in the static case, the FE estimator is also biased. This bias is sometimes referred to as Nickell bias after Nickell (1981).
Intuition here is that the transformed LDV and error are correlated. The group mean of the LDV contains all realizations of \(Y\) minus the final observation of the unit. Any \(Y_t\) is correlated with the appropriate error \(e_t\), which is introduced into the error term by the group difference.
Bias is decreasing in \(T\) not \(N\) - so in short panels the bias can be very large.
First differenced LDV is still correlated with first differenced error
But, this can be handled with standard IV approaches:
Deeper lags of \(Y\) are valid instruments, specifically \(Y_{t-2}\)- no need to find an instrument (although if you have one that also works!). However, this depends on no serial correlation in the errors.
In general, the AH estimator has been superseded by GMM estimators that build on the deeper lag approach, so it’s typically not a named routine. But we can estimate it by hand via standard 2SLS.
fd$l2_lmilex=lag(fd$d_lmilex, 2)
ah<-ivreg(d_lmilex~lag_lmilex+d_polyarchy| l2_lmilex+d_polyarchy, data=fd )
summary(ah)
##
## Call:
## ivreg(formula = d_lmilex ~ lag_lmilex + d_polyarchy | l2_lmilex +
## d_polyarchy, data = fd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.246875 -0.096296 -0.007403 0.099112 4.017816
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.044289 0.006483 6.831 9.34e-12 ***
## lag_lmilex -0.301793 0.153732 -1.963 0.049683 *
## d_polyarchy -0.344988 0.095016 -3.631 0.000285 ***
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 5508 44.269 3.14e-11 ***
## Wu-Hausman 1 5507 1.585 0.208
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2841 on 5508 degrees of freedom
## Multiple R-Squared: -0.0211, Adjusted R-squared: -0.02147
## Wald test: 6.709 on 2 and 5508 DF, p-value: 0.00123
Note that we’ve only used ivreg, which is standard 2SLS estimation (not 2SLS with FE). This is because we’ve swept out the unit effects via first differencing already.